Network Density Estimation: A GIS Approach for Analysing Point Patterns in a Network Space

Abstract

    Human activities and, more generally, the phenomena related to human behaviour take place in a network-constrained subset of the geographical space. These phenomena can be expressed as locations whose positions are configured by a road network, such as address points with street numbers. Although these events can be considered as points on a network, point pattern analysis and the techniques implemented in a GIS environment generally treat events as taking place in a uniform space, with distances expressed as Euclidean over a homogeneous and isotropic space. Network-spatial analysis has developed as a research agenda in which attention is drawn towards point pattern analytical techniques applied to a space constrained by a road network. Little attention has been paid to the first order properties of a point pattern (i.e. density) in a network space, whereas mainly second order analyses, such as nearest neighbour distances and K-functions, have been implemented for network configurations of the geographical space. In this article, a method for examining clusters of human-related events on a network, called Network Density Estimation (NDE), is implemented using spatial statistical tools and GIS packages. The method is presented and compared to conventional first order spatial analytical techniques such as Kernel Density Estimation (KDE). Network Density Estimation is tested using the locations of a sample of central, urban activities, namely bank and insurance company branches, in the central areas of two midsize European cities, Trieste (Italy) and Swindon (UK).

1 Introduction

    Many human-related events taking place in geographical space are referred to or constrained by network-led spatial configurations such as the road transport network at an urban or extra-urban level. This article considers density estimation of point patterns in a network rather than in a conventional, Euclidean continuous space.

    Point pattern analysis is one of the most widely used methods in spatial data analysis (Yamada and Rogerson 2003). The interest usually lies in the characteristics of the point pattern relative to some hypothesised process, such as an independent random process (Bailey and Gatrell 1995), and the methods are used in many fields of research, including geography, economics, demography, criminology, ecology, epidemiology and biology. Methods for analysing first-order properties, which describe how the expected value of the spatial point pattern (its intensity) varies across space, include quadrat analysis and Kernel Density Estimation (KDE), while second-order properties are explored by means of other functions, such as nearest neighbour distances and Ripley’s K-function (Ripley 1976, 1981). The K-function is sometimes called the reduced second order measure (Bailey and Gatrell 1995), as it is designed to measure effects at different scales, including both first order and second order effects, such as local clustering or a general pattern over the region.

    Many phenomena can be considered as point processes and therefore studied by means of point pattern analysis. This is also the case for many human-related phenomena that can be georeferenced as point events in space at different levels of precision, such as postcodes or address points, national grid references and latitude and longitude coordinates. Analyses of point distributions generally use algorithms and procedures that calculate Euclidean distances and consider space as continuous, homogeneous and isotropic. In analyses of social and economic phenomena, however, this is a limitation, as many human-related phenomena are distributed over non-homogeneous spaces such as network-constrained structures. Residents’ locations, shopping centres and bank ATMs are referenced to street addresses, while ‘events’ such as robberies and car accidents take place on networks or are located close to them. In recent years researchers have attempted to consider networks in a GIS environment, with Batty (2005) in particular seeing this as one of the major issues in analysing spatial phenomena and representing them in GIS. Miller (1999) notes that the assumption of a continuous planar space is too strong for analysing events that actually occur in a one-dimensional subset of this space, and Yamada and Thill (2004) point out the greater efficiency of shortest-path versus Euclidean distance measures. Some authors have proposed methods for analysing point patterns over network structures. Examples include nearest neighbour distances on networks (Okabe et al. 1995), the network K-function (Okabe and Yamada 2001) and its applications to events on networks such as car accidents (Yamada and Thill 2004, 2007). These network-based methods have also been adapted in several ways to market area analysis.
Harvey Miller has extended methods originally implemented on a Euclidean space to networks, such as space-time accessibility measures (Miller 1999) and the network Huff model of spatial interaction (Miller 1994). The latter was also examined by Okabe and Kitamura (1996) with a focus on market area analysis on networks, while Okabe and Okunuki (2001) implemented it in a GIS environment. More recently, a series of spatial analytical tools for use in a GIS environment has been implemented to facilitate spatial point pattern analyses on networks (Okabe and Yoshikawa 2003; Okabe et al. 2006a, b).

    Little attention, however, seems to be paid to first order properties of point patterns in a network space, particularly in the analysis of the intensity of event features at a local level along a network. Only recently Borruso (2005) and Downs and Horner (2007a,b) have proposed applications of point pattern analysis adapting a network structure to a Kernel Density Estimation. They respectively considered the network density of intersections in the urban road network as an indicator of urban centrality in the first case, and networks of movement trajectories of animals to estimate their home ranges in the second case.

    This article presents a procedure for estimating the density of a point pattern of human activities over a network and compares it with a more traditional analysis in a Euclidean space. A method inspired by KDE, and particularly related to the naive estimator, called Network Density Estimation (NDE), is proposed and compared with the traditional Euclidean method. In Section 2 some of the main features of point pattern analysis are briefly reviewed. Section 3 introduces network density estimation. In Sections 4 and 5 an application is carried out which examines the differences between traditional and network methods. The locations of bank and insurance branches in the central areas of two European cities (Trieste, Italy, in Section 4 and Swindon, UK, in Section 5) are used as a sample of city-centre (CBD) activities. Results from the two procedures are compared. Section 6 suggests future developments of the procedure, while Section 7 contains concluding remarks.

2 Point Pattern Analysis and Density Estimation 

    When dealing with a point pattern, authors such as Gatrell et al. (1996) define events as the observed locations in a distribution, and points as all the other locations in the study area. Different levels of observation and analysis are proposed. The simple visualization of an event distribution by means of dot maps can provide initial information on the structure of the distribution, but more refined analytical instruments are needed for more in-depth analysis, and particularly to identify clusters or regularity in the distribution relative to an assumed model, usually that of complete spatial randomness (CSR).

    Quadrat analysis is one way of summarising the pattern of a distribution of events within a region R. It involves dividing the study region into sub-regions of equal and homogeneous area (quadrats) and then counting the number of events falling in each quadrat in order to simplify the spatial distribution.
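The quadrat count can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular GIS package. It assumes that events are projected (x, y) coordinates and that square quadrats are aligned with the coordinate axes. The function name and parameters (`quadrat_counts`, `x0`, `y0`, `size`) are hypothetical.

```python
from collections import Counter

def quadrat_counts(events, x0, y0, size, n_cols, n_rows):
    """Count events per square quadrat of side `size`, grid origin at (x0, y0).

    Returns an n_rows x n_cols table; row 0 is the row nearest to y0.
    """
    counts = Counter()
    for x, y in events:
        col = int((x - x0) // size)
        row = int((y - y0) // size)
        # Ignore events falling outside the study region R
        if 0 <= col < n_cols and 0 <= row < n_rows:
            counts[(row, col)] += 1
    # Full table, including empty quadrats (Counter returns 0 for missing keys)
    return [[counts[(r, c)] for c in range(n_cols)] for r in range(n_rows)]
```

Moving the origin (x0, y0) or changing `size` re-partitions the same events and can change every count. This is the arbitrariness of quadrat origin and dimension that motivates the moving-window approach.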

    The number of events therefore becomes an attribute of the quadrat. It is then possible to represent the spatial distribution by means of homogeneous and easily comparable areas, as GIS packages allow the phenomenon to be visualized through a colour-thematic representation of quadrats. Densities are also easy to compute (Gatrell 1994, Bailey and Gatrell 1995, Gatrell et al. 1996). The method has some disadvantages, such as the loss of information from the original data and the arbitrariness of the chosen quadrat shape, dimension, orientation and origin: changing the grid origin or dimensions can produce different results. One way of addressing these limitations is to count the number of events per unit area within a mobile ‘window’ of fixed radius centred at a number of points in the region R. This provides an estimate of the intensity at each point of the grid, and the resulting variation in intensity is smoother than that obtained from a fixed grid of superimposed square cells. This is the so-called 'naive' method, which belongs to a group of procedures called Kernel Density Estimation (KDE). The kernel consists of a family of "moving three-dimensional functions that weight events within its sphere of influence according to their distance from the point at which the intensity is being estimated" (Gatrell et al. 1996).
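The naive moving-window estimator can be sketched as follows. The sketch assumes a circular window of Euclidean radius `tau` and (x, y) coordinates in a projected system. `naive_intensity` is a hypothetical name used only for illustration. Each estimate is the event count inside the window divided by the window area πτ², so every event inside the window carries equal weight.

```python
import math

def naive_intensity(events, grid_points, tau):
    """Naive estimator: events within Euclidean radius tau, per unit area."""
    window_area = math.pi * tau ** 2
    estimates = []
    for gx, gy in grid_points:
        # Count events inside the circular window centred on this grid point
        n_inside = sum(1 for ex, ey in events if math.hypot(ex - gx, ey - gy) <= tau)
        estimates.append(n_inside / window_area)
    return estimates
```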

The general form of a kernel estimator is: 

$$\lambda(s) = \sum_{i=1}^{n} \frac{1}{\tau^{2}}\, k\!\left(\frac{s - s_i}{\tau}\right)$$

    where λ(s) is the estimate of the density of the spatial point pattern measured at location s, s_i is the observed i-th event (i = 1, …, n, with n the number of events), k() represents the kernel weighting function and τ is the bandwidth. The KDE function allows one to estimate the intensity of a point pattern and to represent it by means of a smoothed three-dimensional continuous surface that shows the variation in the density of point events across the study region. The procedure can be organized in three steps (Chainey et al. 2002):

  1. A fine grid is placed over the study region and the distribution of events; 
  2. A moving three-dimensional function visits each cell and calculates weights for each point within the function's radius (threshold or bandwidth). In most of the kernel functions considered, events closer to the centre are given a higher weight than those located at the edge of the search function, therefore contributing more to the reference cell's density value; and 
  3. Grid cell values are calculated by summing the values of all surfaces for each location. 

    The routine therefore calculates the distance between each reference cell and the events' locations, evaluates the kernel function for each measured distance and sums the results for each reference cell (Levine 2004).
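The three-step routine can be sketched with NumPy for one common choice of kernel. This is the quartic function in the form commonly cited from Bailey and Gatrell (1995): 3/(πτ²)(1 − d²/τ²)² for d < τ, and 0 otherwise. The sketch assumes projected coordinates and a rectangular grid whose lower-left corner is (x0, y0). `quartic_kde_grid` and its parameters are hypothetical names, not part of any GIS package. It computes all cell-to-event distances at once, which is fine for small examples but memory-hungry for large grids.

```python
import numpy as np

def quartic_kde_grid(events, x0, y0, cell, n_cols, n_rows, tau):
    """Quartic-kernel intensity at grid-cell centroids; returns (n_rows, n_cols)."""
    ev = np.asarray(events, dtype=float)            # shape (n, 2)
    # Step 1: centroids of a fine grid laid over the study region
    xs = x0 + cell * (np.arange(n_cols) + 0.5)
    ys = y0 + cell * (np.arange(n_rows) + 0.5)
    gx, gy = np.meshgrid(xs, ys)                    # each (n_rows, n_cols)
    # Step 2: squared distance from every centroid to every event
    d2 = (gx[..., None] - ev[:, 0]) ** 2 + (gy[..., None] - ev[:, 1]) ** 2
    # Quartic weight, zero at and beyond the bandwidth tau
    w = np.where(d2 < tau ** 2,
                 3.0 / (np.pi * tau ** 2) * (1.0 - d2 / tau ** 2) ** 2,
                 0.0)
    # Step 3: sum the contributions of all events for each cell
    return w.sum(axis=-1)
```

Summing a fine grid of estimates and multiplying by the cell area returns approximately 1 per event. This illustrates the volume-preserving property of KDE.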

    The function has many advantages compared with other techniques, as it allows the density to be estimated at any location in the study region (O'Sullivan and Unwin 2003) while preserving the total number of events. It also allows a field representation of the phenomenon by means of a smooth, three-dimensional continuous surface in which peaks represent the presence of clusters or 'hot spots' in the distribution of events. The arbitrary parameters of KDE are the bandwidth (Gatrell et al. 1996) and the grid cell size [1]. Different bandwidths allow the phenomena to be analysed at different scales: a wider bandwidth shows a more general trend over the study region and smooths the spatial variation of the phenomenon, while a narrower bandwidth highlights more local effects such as 'peaks and valleys' in the distribution. The choice of bandwidth also depends on the sample size, as sparser events are generally better evaluated using a larger bandwidth. A narrower one will not provide much more information than the simple observation of the event distribution in a dot map or scatter plot. When the bandwidth is fixed, the search radius is constant over the study region, but an adaptive bandwidth can be used instead (Silverman 1986, Brunsdon 1995). Most authors emphasize that the choice of bandwidth is more important than the choice of weighting function, as the statistical results are not significantly affected by the different kernel functions (Epanechnikov 1969) [2].

    Different weighting functions can also be used. Levine (2004) summarizes five different weighting functions: normal, uniform, quartic, triangular and negative exponential. The normal function superimposes a bell-shaped function over every location in the region, extending to infinity in all directions. It therefore weights all the points in the study region, with closer points weighted more than distant ones. It is one of the most widely used functions in KDE, although many authors prefer a quartic function (Bailey and Gatrell 1995). The other four functions can be considered circumscribed ones, as they strictly search for events within a radius (bandwidth) centred on each grid reference cell.

    KDE surfaces produced with a normal function appear smoother than those produced with the other functions. Quartic and uniform functions also tend to produce smooth surfaces from the data, although less smooth than those obtained with a normal function. Triangular and negative exponential functions produce more ‘spiky’ areas, emphasizing ‘peaks’ and ‘valleys’ in the data distribution.
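The five weighting functions can be compared directly as functions of distance d and bandwidth h. The sketch below uses common textbook shapes and returns relative (unnormalised) weights. It treats h as the standard deviation for the normal function. The `decay` constant of the negative exponential is a hypothetical choice. Neither the normalising constants nor the exact parameterisation used by Levine (2004) is reproduced here.

```python
import math

def kernel_weight(d, h, kind="quartic", decay=3.0):
    """Relative (unnormalised) weight of an event at distance d for bandwidth h."""
    if kind == "normal":
        # Unbounded: every event receives some weight, however far away
        return math.exp(-0.5 * (d / h) ** 2)
    if d > h:
        return 0.0  # the other four functions are circumscribed by the bandwidth
    if kind == "uniform":
        return 1.0
    if kind == "quartic":
        return (1.0 - (d / h) ** 2) ** 2
    if kind == "triangular":
        return 1.0 - d / h
    if kind == "negative_exponential":
        return math.exp(-decay * d / h)
    raise ValueError(f"unknown kernel: {kind}")
```

At half the bandwidth (d = h/2), the uniform weight is still 1, the quartic weight is 0.5625 and the triangular weight 0.5. The negative exponential weight with decay = 3 is exp(−1.5), about 0.22. This steeper drop away from the centre is why triangular and negative exponential surfaces look spikier.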

    The choice of a particular function should therefore follow the user’s aim of highlighting a more general trend rather than finer variations (see also Atkinson and Unwin 1998), as well as the possible need to assign different weights to near points than to far points (Levine 2004).

3 Density Estimation on Networks: NDE

    KDE finds clusters in point pattern distributions over a study area, particularly highlighting ‘circular’ clusters. However, clusters can follow different distribution schemes in network-led spaces, for example, ‘quality’ shops in city centres tend to cluster along high streets while out-of-town ‘big box’ shopping centres are distributed along major roads, therefore forming ‘linear’ clusters.

    Network Density Estimation (NDE) involves a modification of the search function from one based on Euclidean distance to a network-based one. The bandwidths are calculated as shortest paths departing from each grid cell’s centre and following the segments composing the network. Each search area therefore consists of a bandwidth-defined shortest-path tree and its bounding polygon, with shapes that vary with the network’s structure (Figure 1). Furthermore, density analysis on a network involves the definition of a network space. This is the subset of the geographical space close to the network itself, and therefore the part of the geographical space that can to some extent be considered ‘affected’ by the presence of a network (e.g. in an urban environment, the pavements facing a road, the street fronts of buildings, etc.).

    In order to test the procedure and compare it with the traditional function based on Euclidean distance, an analysis is conducted using a network search function conceptually similar to the naive estimator and the uniform function used in KDE. In these latter functions, a moving window is placed over the study region, visiting each location for which a density estimate is required and counting the events falling inside. In NDE the moving window is not circular, as in the naive estimator or in the family of kernel functions of KDE, but has a variable shape. It is built as a network service area and therefore depends on the road network structure (see Figure 1 for the differences between the two search functions). In NDE a fine resolution grid is superimposed over the study region, and the grid cells' centroids are computed and used as reference locations for the density estimation. Each event is counted according to whether or not it belongs to the area defined by the bandwidth on the network. No weighting function is applied when moving out from the reference cell's centre towards the service area's boundary. In this formulation of NDE, events closer to a reference cell's centre are given the same importance as farther ones, as their contribution to the density function depends only on their belonging to each shortest path tree. The function applied here is not volume preserving, as Kernel Density Estimation is when the aim is to obtain a smooth estimate of a univariate or multivariate probability density (Bailey and Gatrell 1995). Nor is it pycnophylactic (Tobler 1979): the NDE function produces a 'pure' density value, expressed in terms of both 'events per (linear) kilometre' and 'events per square kilometre', for each point (reference cell) at which the density is computed.
The intensity value at every given location (reference cell) can therefore be obtained by counting the number of events belonging to each shortest path tree and dividing this count both by the overall length of the segments composing the tree, which gives a linear density, and by the surface of the service area bounding the tree (Fotheringham et al. 2002).
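The two estimates can be written in symbols, using notation introduced here for illustration only. Let T_c be the shortest path tree generated from reference cell c within the bandwidth, n(T_c) the number of (mirror) events on it, L(T_c) its total length and A(T_c) the area of its bounding service area. The two NDE estimates are then:

$$\lambda_L(c) = \frac{n(T_c)}{L(T_c)}, \qquad \lambda_A(c) = \frac{n(T_c)}{A(T_c)}$$

For example, a hypothetical tree 0.4 km long carrying 3 events gives λ_L(c) = 3 / 0.4 = 7.5 events per linear kilometre. Unlike the kernel sum, neither ratio weights events by their distance from the reference cell.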

Figure 1 Differences in search area between Euclidean and network distances. Thin black segments belong to the road network. (a) The dashed circle is the service area computed using a Euclidean distance from the reference cell’s centroid; the thick black segment is the search radius (bandwidth). (b) In light grey, the service area computed using a network distance; dark-grey segments belong to the shortest path tree; the dashed line is the bandwidth computed as a shortest path on the network

    As reported by Silverman (1986), the naive estimator is not fully satisfactory compared with the more refined kernel estimator, in which a continuous function is placed as a 'hump' on each observation. The naive estimator instead gives a 'boxed' and ragged visualization of the density estimate. Technically, the naive estimator is not continuous, whereas continuity is a desirable property maintained in Kernel Density Estimation. Moreover, the NDE formulation used here is not normalized to integrate to unity. However, in the NDE applied here the 'box' effect is quite limited and a certain level of smoothing in the visualization is maintained, as the estimate varies gradually over the study region when moving from one cell to a contiguous one.

    The following section outlines the workflow necessary to perform a network density estimation of a point process over a study area. The actions performed in the different steps can be realized using desktop GIS and standard spatial analytical tools. The first four steps involve the preparation of the dataset, while the density analysis itself is performed from Step 5. 

  • Step 1 – selection of the point pattern: This consists of point features (events) in a study area, such as the locations of ATMs, retail stores, gas stations or places where car accidents, robberies or other events take place. The point pattern, however, must be considered in a network environment.
  • Step 2 – generation of a regular grid over the study area: As in the quadrat count method and in traditional KDE, this step is necessary in order to define the starting point for each search function to be computed and estimate the density of events within its search radius. Grid cells represent the minimum units used for estimating the density. The centroid of each cell is derived as the reference location for the search function. 
  • Step 3 – selection of a network: A network layer is used for the computation of the distances from the reference cells. The point process considered is in fact constrained by the presence of the network and is located on it or in the neighbouring area of the network itself. 
  • Step 4 – assignment of events and cells to the network ('mirror points'): Point locations, such as events of the point pattern and grid cell centroids, are assigned to the network by creating mirror points snapped to the nearest points on the network, in order to facilitate the network analyses. Step 4 can be organized into two sub-steps, Step 4a and Step 4b.
  • Step 4a – assignment of point pattern to the network ('mirror events'): Locations of features can be expressed as points, obtained either as address points, generally produced as point features by local authorities (e.g. municipalities), or by a process of geocoding using dynamic segmentation. In the first case, as address points are usually referenced to a building's shape, they can be located some distance from a street centreline and therefore do not lie on the network itself. As a real street segment has a certain width, an address point can be considered as facing the street segment itself. A more precise computation, particularly of whether or not an event belongs to the network, may require a set of 'mirror events'. These represent the locations of the events snapped onto the street network (Figure 2) [3].
  • Step 4b – assignment of cell centroids' locations to the network ('mirror cells'): As a network space can be considered a subset of the Euclidean geographical space, an issue arises when choosing the locations at which to estimate the network density function. The network density could be estimated either only in cells close to the network, and therefore accessible from it, or in all the cells belonging to the study region. In the first case, only the cells that intersect the road network are selected from the entire set of cells covering the study area, and they are snapped to their mirror locations on the network's arcs. This restricts the reference locations to those on or close to the network and also limits the number of locations at which the density function is calculated. A network space of cells selected from the original dataset is therefore considered, a solution that is also less computationally intensive than the second one. Such a network space could also be obtained by creating a buffer zone around the network, to approximate road widths, and then performing a spatial intersection between the network buffer and the grid cells. This could be done to represent a more realistic situation in terms of road network structure and impedance [4]. In the second case, all the grid cells could be used to estimate the network density over the whole study region. Their distances from the network segments should be calculated and their mirror locations on the network obtained. However, cells farther from the network should be weighted considerably less than closer ones, as they may represent locations inside a building or in a park. The 'effort' of reaching the network itself should therefore be weighted [5].
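Snapping an event or a cell centroid to its mirror location amounts to an orthogonal projection onto the nearest network segment, clamped to the segment's endpoints. The following is a minimal brute-force sketch. It assumes that the network is given as a list of straight segments in projected coordinates. `snap_to_network` is a hypothetical helper; a real GIS would use a spatial index rather than scanning every segment.

```python
import math

def snap_to_network(point, segments):
    """Return (mirror_point, segment_index, distance) for the nearest segment.

    segments: list of ((ax, ay), (bx, by)) straight street segments.
    """
    px, py = point
    best = None
    for idx, ((ax, ay), (bx, by)) in enumerate(segments):
        dx, dy = bx - ax, by - ay
        seg_len2 = dx * dx + dy * dy
        # Parameter t of the orthogonal projection, clamped to [0, 1]
        t = 0.0 if seg_len2 == 0 else ((px - ax) * dx + (py - ay) * dy) / seg_len2
        t = max(0.0, min(1.0, t))
        mx, my = ax + t * dx, ay + t * dy
        dist = math.hypot(px - mx, py - my)
        if best is None or dist < best[2]:
            best = ((mx, my), idx, dist)
    return best
```

The returned distance is the offset between the original point and the network. It could serve as the 'effort' to be weighted for cells far from the network in Step 4b. The segment index and mirror point locate the point along an arc for the network analysis in the following steps.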

Figure 2 Events and their mirror locations over a network. Light grey segments belong to the entire road network; the circular buffer is in dashed black line; the network service area is in medium grey; the shortest path tree is in dark grey. The light stars represent sample events while the dark stars are their mirror locations on the network. Note the locations of points A and A′. Point A is included in the network service area but it is not reachable through the shortest path tree from point O. Its mirror location A′ is not reached by the shortest path tree and therefore is not selected for the density estimation

  • Step 5 – definition of a bandwidth: The choice of a bandwidth is of crucial importance for density estimation, as it is the search radius of the density function. Various methods have been proposed for its definition [6]. In most cases they leave the researcher to experiment with different bandwidth values to examine the variation of the function at different scales (Bailey and Gatrell 1995). Here the bandwidth is measured over the network rather than in the Euclidean space, from the cell's centroid along the arcs of the network. Only the events within that network distance are used. The methods used to define the bandwidth in a Euclidean space should therefore be adapted to the network configuration of space.
  • Step 6 – computation of shortest path trees and service areas from each reference point: A shortest-path tree is calculated starting from each cell's centroid along the network's arcs, according to the search bandwidth distance selected. A metric distance is calculated along the arcs belonging to the network. More sophisticated analysis could consider directional constraints on a road network, speed limitations or other types of transport costs related to the different arcs. A network-service area is also derived from the shortest path tree, consisting of the bounding polygon of every shortest path tree. The overall lengths of the shortest path trees and the areas of the bounding polygons are also calculated. 
  • Step 7 – overlay, intersection and computation of density: Network-service areas and shortest path trees generated from each reference cell are overlaid on the (mirror) point process and used to estimate the density of events on the network. This is accomplished by counting the number of events, represented by their mirror events, intersecting each shortest path tree within the given (bandwidth) distance from each reference cell. Events are summed and assigned to the reference cell [7]. Where the events belonging to the point process carry an intensity value at their locations, a weighted sum can also be performed. In this formulation of the NDE estimate no 'distance decay' function is applied when visiting the events, and therefore only whether or not they belong to the shortest path tree is considered. The density computation is therefore conceptually similar to the naive density estimator. The number of events can be divided by the overall length of each shortest-path tree, providing a linear density estimate of the events' distribution. This density value is then assigned to each reference cell as an attribute [8]. Alternatively, the number of events falling on each shortest-path tree can be divided by the service area surface, giving a density estimate expressed in terms of events per square kilometre. This allows a more direct comparison with other area-based density estimation approaches such as the original KDE.
  • Step 7bis – computation of weighted distances of events from reference cells' centroids: For each event, its network distance from each reference cell within the bandwidth is calculated. This allows events within the bandwidth to be weighted differently according to their distance from the cell's centroid, with closer events weighted more than farther ones. It is therefore possible to calculate a density function similar to a KDE one.
  • Step 8 – visualization of the density surface: The results obtained can be visualised in a GIS, coding the density value by colour. The resulting dataset is, however, a subset of the overall study region organized as a 'grid interpretation of the data' (Levine 2004). Its visualization will contain blank areas corresponding to the cells that do not belong to, and are not close to, arcs in the network [9].
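Steps 6, 7 and 7bis can be sketched for a single reference cell in plain Python. The sketch rests on several assumptions not stated in the procedure above:

- The reference cell's mirror location has been inserted into the network as a node (`source`).
- Edges are undirected `(u, v, length)` tuples in metres, with at most one edge per node pair.
- Each mirror event is given as `(u, v, offset)`, with `offset` measured from `u`.
- The tree's extent is taken as every edge portion within network distance τ of the source.

The function names are hypothetical. Only the linear density is computed, since the areal version also requires the service area's bounding polygon.

```python
import heapq
from collections import defaultdict

def network_distances(edges, source, cutoff):
    """Dijkstra from `source`, keeping only nodes within `cutoff` metres."""
    graph = defaultdict(list)
    for u, v, length in edges:               # undirected street segments
        graph[u].append((v, length))
        graph[v].append((u, length))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                         # stale queue entry
        for v, length in graph[u]:
            nd = d + length
            if nd <= cutoff and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def nde_linear_density(edges, source, mirror_events, bandwidth, weight=None):
    """Return (event total, covered length in m, events per linear km).

    weight: optional function weight(d, bandwidth) for Step 7bis;
    None gives the naive count (membership only, no distance decay).
    """
    inf = float("inf")
    dist = network_distances(edges, source, bandwidth)
    lengths = {}
    covered = 0.0
    for u, v, length in edges:
        lengths[(u, v)] = lengths[(v, u)] = length
        # Portions of the edge reachable from each end within the bandwidth
        a = min(length, max(0.0, bandwidth - dist.get(u, inf)))
        b = min(length, max(0.0, bandwidth - dist.get(v, inf)))
        covered += min(length, a + b)
    total = 0.0
    for u, v, offset in mirror_events:
        length = lengths[(u, v)]
        # Network distance to the event, entering its edge from either end
        d = min(dist.get(u, inf) + offset, dist.get(v, inf) + (length - offset))
        if d <= bandwidth:
            total += 1.0 if weight is None else weight(d, bandwidth)
    per_km = total / (covered / 1000.0) if covered > 0 else 0.0
    return total, covered, per_km
```

Passing, for instance, `weight=lambda d, h: 1 - d / h` turns the naive count into a triangular distance-decay sum in the spirit of Step 7bis. Repeating the call for every reference cell and storing the result as a cell attribute produces the grid visualised in Step 8.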

4 Network Density Estimation for the Distribution of Bank and Insurance Branches in the Municipality of Trieste 

    These ideas have been tested using a dataset consisting of the locations of bank branches and insurance companies in the central area of the Municipality of Trieste (northeastern Italy, Figure 3a). The aim was to highlight clusters of financial services and to test for the existence of a 'financial district'. Banks and insurance companies are among the main human activities considered in urban geography studies for the definition of the Central Business District (Murphy and Vance 1954b). Their concentration is generally high in central areas and is usually associated with higher land values than in less central and more peripheral parts of a city. Both land values and the density of such activities generally decrease as the distance from the city centre increases. They can show variations and secondary, lower peaks at minor settlements and at intersections of major arterial roads. Murphy and Vance (1954a) in particular used the concentration of central activities to define and delineate the shape and extent of the CBD in urban areas. Density estimators, and particularly network-based ones, could be useful in revisiting such research, given the more realistic approximation of the urban structure they allow [10].

    The urban road network was drawn in the study area, together with the point dataset represented by the locations of bank and insurance companies. The analysis used a point dataset consisting of 109 events representing bank branches and insurance companies. These data were collected from the Italian Yellow Pages and georeferenced to address points. A grid of 48 rows and 87 columns of 20 m cells was superimposed over the study region, covering an area of 1,670,400 m². A road network of 512 arcs with a total length of 34,470 m was also used (Figure 3b).
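As a quick consistency check on these figures, the grid dimensions reproduce the stated study area. Dividing the event count by the total network length gives a study-wide mean of about 3.16 events per linear kilometre. That mean is simple arithmetic on the reported numbers. It is offered only as a reference level for the local NDE values, not as a result reported by the analysis.

```python
rows, cols, cell_m = 48, 87, 20        # Trieste grid: rows, columns, cell side (m)
n_events, network_m = 109, 34_470      # bank/insurance events, total arc length (m)

width_m, height_m = cols * cell_m, rows * cell_m     # 1740 m x 960 m
area_m2 = width_m * height_m                          # 1,670,400 m2
n_cells = rows * cols                                 # 4176 reference cells
mean_linear_density = n_events / (network_m / 1000)  # events per linear km

print(f"{width_m} m x {height_m} m = {area_m2:,} m2, {n_cells} cells")
print(f"mean density: {mean_linear_density:.2f} events/km")
```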

Figure 3 (a) The municipality of Trieste with road network (light grey), bank and insurance companies (black dots) and the position of the study region (black box); (b) the study region. The grid cells are displayed as reference locations for the density analysis (light grey grid), as well as the urban street network (dark grey lines) and distribution of banks and insurance companies (black dots). Centroids for three selected reference cells are displayed (grey dots) as well as their search functions considering the Euclidean distance (thick dashed grey circles) and network distance on shortest path tree (thick dashed black line polygons); and (c) the study region with shortest path trees for three selected reference cells (thick black lines), distribution of banks and insurance companies (black dots) and their mirror locations on the network (grey dots)

    The analysis was performed following the steps described above. Shortest path trees and network-service areas were therefore drawn from each reference cell [11] using a 125 m bandwidth [12]. This value was chosen after several simulations testing alternative bandwidths.

    The choice of a 125 m bandwidth followed considerations on the micro, urban scale of the analysis and on the aim of observing the distribution of banks and insurance branches at a very detailed level. It was observed that at this scale of analysis a bandwidth lower than 125 m produced an overly 'spiky' representation of the phenomenon. In extreme cases, it provided little more information than the simple observation of the point distribution. On the other hand, bandwidth values higher than 250 m caused an excessive dilution of the spatial pattern. It is worth noting that the study region considered here is an area of high concentration of banks and insurance companies, as well as of other 'central' human activities, in the municipality of Trieste. A traditional 400 m bandwidth quartic Kernel Density Estimation previously performed over the entire municipality highlighted a single peak in the density function of such activities in the central area of the city (Borruso 2006). That central area is the one under investigation here [13]. A narrower bandwidth therefore allows a more in-depth local analysis of this peak area.

    Shortest path trees were used to count the number of banks and insurance companies lying on the network within the 125 m bandwidth distance [14], and the count values were assigned as attributes of the reference cells. Counts were divided by the overall length of each shortest path tree to obtain relative densities in terms of events per linear kilometre for each reference cell. The 20 m grid cells used represent a discrete approximation of a continuous space and did not conflict with the 125 m bandwidth. Different authors stress the importance of the bandwidth rather than the grid cell size, as the latter mainly determines a finer or coarser resolution of the three-dimensional density function obtained. In fact, de Smith et al. (2007) point out that "the grid resolution does not affect the resulting surface form to any great degree". O'Sullivan and Wong (2007) note that a grid resolution smaller than the bandwidth by a factor of five or more, and at least by a factor of two, affects the density estimate negligibly.
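The rule of thumb attributed to O'Sullivan and Wong (2007) can be expressed as a simple check on the bandwidth-to-cell ratio. The thresholds below are taken from that statement; `resolution_check` and its labels are hypothetical. For the values used here the ratio is 125 / 20 = 6.25, above the factor of five.

```python
def resolution_check(bandwidth_m, cell_m):
    """Classify a grid resolution against the bandwidth (factor-of-5 / factor-of-2 rule)."""
    ratio = bandwidth_m / cell_m
    if ratio >= 5:
        return ratio, "negligible effect on the estimate"
    if ratio >= 2:
        return ratio, "minimum acceptable resolution"
    return ratio, "cells too coarse for this bandwidth"

print(resolution_check(125, 20))
```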

    These values were then interpolated to obtain a continuous surface, to be represented both as a traditional two-dimensional map and as a three-dimensional one. Inverse Distance Weighting (IDW) was used to interpolate the density values, using actual distances and a spatial resolution of 20 m for the computed grid. The interpolator was given a power of 1, which produced a smoother and better-shaped density surface. This value was chosen after testing others, such as 2 and 3, which did not appreciably change the overall shape of the density surface but produced a spikier distribution. 
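The role of the IDW power can be illustrated with a textbook IDW interpolator. This is a minimal sketch, not Surfer's implementation (which offers further options such as smoothing and search neighbourhoods); it assumes that every sample contributes to every grid node, and the function names are hypothetical.

```python
import math


def idw(samples, x, y, power=1.0):
    """Inverse Distance Weighting estimate at (x, y) from (sx, sy, value) samples."""
    numerator = 0.0
    denominator = 0.0
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # exact hit: return the sample value itself
        weight = 1.0 / d ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


def idw_grid(samples, x0, y0, n_cols, n_rows, cell=20.0, power=1.0):
    """Interpolate at the centres of a regular grid of `cell`-metre cells (list of rows)."""
    return [[idw(samples, x0 + (c + 0.5) * cell, y0 + (r + 0.5) * cell, power)
             for c in range(n_cols)] for r in range(n_rows)]


samples = [(0.0, 0.0, 10.0), (20.0, 0.0, 20.0)]
print(idw(samples, 5.0, 0.0, power=1))  # about 12.5
print(idw(samples, 5.0, 0.0, power=2))  # about 11.0: pulled further towards the nearest sample
```

A higher power makes each estimate depend more strongly on the nearest samples, which is why powers of 2 and 3 yield a spikier surface than a power of 1.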

    To accomplish this, the reference cell densities and coordinates were exported from the GIS environment and processed using surface interpolation software [15]. Refinements to the density estimation suggested by some authors (Silverman 1986, Bracken 1994), such as adaptive bandwidths and edge-effect corrections, were not considered at this stage. 

    Figures 4 and 5 show three- and two-dimensional visualisations of the network density analysis. Peaks appear mainly in the centre-west and south-west of the study area, corresponding to the city centre, where a higher clustering of events, represented by higher values of the density surface, is observed. Two main clusters can be identified in the south-west part of the study region, where the highest densities are found. Other minor clusters are visible in the centre-east of the study region, with a few cases along main roads. A series of small elongated clusters following a north-northeast to south-southwest line can also be seen in the western part of the study region. This corresponds to a major seaside road that bounds the study region and along one side of which several banks and insurance companies are located. 

Figure 4 NDE on bank and insurance branches, 125 m bandwidth [linear density] (3D)


Figure 5 NDE on bank and insurance branches, 125 m bandwidth [linear density] (2D)

    A similar network analysis was performed to produce a density estimate expressed in events per square kilometre. In this case, the number of events falling within the 125 m network bandwidth of each reference point was divided by the area of the corresponding network service area. As in the previous case, the single cell values were interpolated to obtain a continuous surface. The results are displayed in Figures 6 and 7. 
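The area normalisation differs from the linear one only in the denominator. A minimal sketch, assuming that the service area of a reference cell is available as a simple polygon given by its vertex coordinates in metres (in the study these polygons came from the GIS network tools); the function names are hypothetical.

```python
def polygon_area(vertices):
    """Planar area (m^2) of a simple polygon via the shoelace formula."""
    twice_area = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def area_nde(event_count, service_area_vertices):
    """Events per square km of the network service area of one reference cell."""
    area_km2 = polygon_area(service_area_vertices) / 1_000_000.0
    return event_count / area_km2 if area_km2 > 0 else 0.0


square = [(0.0, 0.0), (250.0, 0.0), (250.0, 250.0), (0.0, 250.0)]
print(area_nde(3, square))  # 3 events / 0.0625 sq km = 48.0
```

Because the count is the same in both versions, any difference between linear and area NDE surfaces comes only from how network length and service-area extent vary from cell to cell.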

Figure 6 NDE on bank and insurance branches, 125 m bandwidth [area densities] (3D)

Figure 7 NDE on bank and insurance branches, 125 m bandwidth [area densities] (2D)

    Normalising the values by network length rather than by service area does not greatly affect the overall shape of the density surfaces, and very similar clusters are identified in the two analyses. The density values differ in absolute terms, and the linear network density surface appears slightly smoother than the area-normalised one, but in both the main clusters of events are easily recognised in the south-west and centre-west parts of the study region. Area-NDE can be used for direct comparison with more traditional KDE, while Linear-NDE is more consistent with transport analysis, where density values are expressed in events per linear km. 

Figure 8 Uniform KDE on bank and insurance branches, 125 m bandwidth (3D) 

Figure 9 Uniform KDE on bank and insurance branches, 125 m bandwidth (2D)

    These NDE results can be compared with the conventional KDE that inspired this kind of analysis. A uniform KDE was computed on the same dataset of events using standard spatial statistical software [16]. The same 20 m cell resolution and 125 m bandwidth were maintained, the latter now defining a straight-line Euclidean distance. 

    For comparison purposes, the simple naive estimator was used instead of more complex kernel functions. Figures 8 and 9 show the results of KDE in three- and two-dimensional visualisations. 
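For contrast with the network version, the naive (uniform-kernel) estimator counts events inside a Euclidean circle of radius equal to the bandwidth and divides by the circle's area. This is a minimal textbook sketch, not CrimeStat's code, whose options and normalisation may differ; coordinates are assumed to be in metres.

```python
import math


def uniform_kde(events, x, y, bandwidth):
    """Naive (uniform-kernel) density at (x, y), in events per sq km; coordinates in metres."""
    count = sum(1 for ex, ey in events if math.hypot(ex - x, ey - y) <= bandwidth)
    circle_km2 = math.pi * (bandwidth / 1000.0) ** 2
    return count / circle_km2


events = [(0.0, 0.0), (100.0, 0.0), (0.0, 200.0)]
print(uniform_kde(events, 0.0, 0.0, 125.0))  # 2 events within 125 m: about 40.74 per sq km
```

The only structural difference from the NDE sketches is the search region: a circle of fixed area here, against a shortest path tree (or its service area) whose size depends on the local network.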

Table 1 Comparison between Network Density Estimator using area and linear densities and uniform Kernel Density Estimator (Trieste, Italy)


                 NDE (linear density)    NDE (service area density)    KDE (uniform)
                 events per linear km    events per sq km              events per sq km
max              17.12                   511.13                        397.33
min              0.86                    31.39                         20.91
mean             4.26                    132.21                        98.78
st. dev.         2.81                    86.36                         82.25
not null cells   1,748                   1,748                         2,779


    The KDE highlights peaks in the same areas as the Network Density Estimation. Nevertheless, the peaks in the south-west part of the study area are merged into a single elongated cluster that is less consistent with the orientation and shape of the road network, and other areas appear denser. The two clusters in the central part of the study region seem to follow an arched shape oriented north-south, and a minor cluster appears in the eastern part. In the Network Density Estimation, by contrast, the two clusters in the central part of the region are oriented differently: the southernmost of the two is aligned with a major road that starts from the centre of the study region and runs west-southwest to east-northeast, together with two other minor clusters along the same road. 

    It is also worth noting that in the KDE analysis the linear cluster along the major seaside road highlighted by NDE is less evident, being replaced here by a few weak circular clusters. 

    When expressed as area densities (events per square kilometre), the NDE values are generally higher than those obtained with KDE (Table 1): maximum, minimum, mean and standard deviation are all higher for areal NDE. The numbers of null cells produced by the two analyses are also quite different, with NDE presenting a considerably lower number of 'not null' cells (1,748 against 2,779). In the NDE analyses the cells considered are only those close to the network itself, in this case within 20 m of the network and in any case not farther than the 125 m bandwidth, as the density analysis produces results limited to the network subset of the study region. Cells outside these 'network ranges' are therefore assigned a null value, as they are not, or only poorly, accessible from the network. Linear NDE values (events per linear kilometre) cannot be compared directly with uniform KDE values, apart from the visual impact of the two distributions; a comparison can, however, be made using the areal NDE, since the visual results from the two versions of NDE (areal vs. linear) are not very different from each other.

    Although NDE can be computed both in terms of linear and areal densities, linear densities seem to better represent the 'philosophy' of a network-driven analysis, given the linear, network-oriented approach behind NDE. It can nevertheless be useful to consider both kinds of NDE, particularly when comparing different network-constrained environments, as different cities, or different parts of the same city, are characterized by different network structures. Comparing linear and areal network densities in two different network environments can help in better understanding the spatial distribution of events, and in identifying where the conditions exist for linear clusters and where 'traditional' circular clusters dominate. 
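The statistics reported in Table 1 refer to the 'not null' cells only. A minimal sketch of such a summary, assuming a density grid stored as a list of rows in which `None` marks a null cell (a hypothetical representation). The sample standard deviation is used here as an assumption, since the definition behind the table is not stated.

```python
import statistics


def summarise(grid):
    """Table-style statistics over the non-null cells of a density grid (None = null cell)."""
    values = [v for row in grid for v in row if v is not None]
    return {
        "max": max(values),
        "min": min(values),
        "mean": statistics.mean(values),
        # Assumption: sample standard deviation; statistics.pstdev gives the population one.
        "st. dev.": statistics.stdev(values),
        "not null cells": len(values),
    }


grid = [[None, 2.0, 4.0], [None, None, 6.0]]
print(summarise(grid))
```

Excluding null cells is what makes the NDE and KDE 'not null' counts differ: KDE assigns a value to every cell within the bandwidth of an event in Euclidean space, whereas NDE only assigns values to cells reached from the network.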

Figure 10 (a) The Borough of Swindon with road network (light grey), bank and insurance companies (black dots) and the position of the study region (black box); (b) the study region. The grid cells are displayed as reference locations for the density analysis (light grey grid), together with the urban street network (dark grey lines) and distribution of banks and insurance companies (black dots) as well as their mirror locations (grey dots)

    Differences between the two analyses are, however, not very marked, although with a given bandwidth NDE seems to be more effective than KDE in highlighting clusters at the local level. In the case study presented, the general lack of clear differences between the two analyses could be a consequence of the characteristics and orientation of the street network in the urban area considered, where a Manhattan-like structure dominates and the street network pattern is therefore highly regular. However, where a few long roads dominate the network structure and events are distributed along them, the Network Density Estimator allows linear clusters along the network to be visualised. A first conclusion is that in urban areas where the network structure is particularly compact and events are densely distributed in space, the differences between KDE and NDE are minimal, whereas NDE performs more consistently in areas where major streets or roads, such as high streets or main roads connecting the central part of a city to its outer parts, shape the network structure and the distribution of point events is organised around them.

5 Banks and Insurance Companies in the Borough of Swindon

    To compare the NDE results across different settings, the procedure was tested on a different urban area using data of a similar nature: the urban area of the Borough of Swindon (UK), where bank and insurance company locations were considered within a study region selected in the city centre (Figure 10a). The area was chosen because it is similar in size to Trieste, both in population and in area, although banks and insurance companies are fewer and less concentrated in Swindon than in Trieste. The urban road network of the study area was drawn from an OSCAR dataset (Ordnance Survey®), together with a point dataset of 32 bank and insurance company locations. These data were collected from UK Yellow Pages and georeferenced to unit postcodes. A grid of 70 rows and 67 columns of 20 m cells was superimposed over the study region, covering an area of 1,876,400 m², and a road network of 374 arcs with a total length of 26,519 m was also used (Figure 10b). 

Figure 11 NDE on bank and insurance branches, 125 m bandwidth [linear density] (3D)

    The analysis was conducted following the same steps as in the case of Trieste. The same 125 m bandwidth was used on the Swindon dataset so that the density estimation relies on a similar scale of analysis. The density estimation highlights two sub-regions of the study area where bank and insurance company branches cluster, located in the north-western and south-eastern corners of the study area respectively. In the north-western part of the region there is a main cluster on the west and two other clusters along a main road, the northern one elongated along a northwest-southeast road and the southern one located at a crossing between two main roads. Other clusters can be identified in the south-eastern area. Their peaks are less evident than those in the northern part, although a similar pattern can be noticed, particularly a long cluster towards the south-eastern corner of the study region, elongated along a northwest-southeast oriented road. A first analysis considered density expressed as events per linear km (Figures 11 and 12): shortest path trees were computed for each reference cell, the events intersecting them were counted, and each count was divided by the total length of the corresponding shortest path tree. 

Figure 12 NDE on bank and insurance branches, 125 m bandwidth [linear density] (2D)

    A second analysis produced a density estimate expressed in events per square kilometre (Figures 13 and 14). The results of the two analyses do not differ very much, but the 'linear density' analysis seems better suited to distinguishing clusters elongated along major streets or roads from the other clusters. The density analysis carried out for Swindon relied on a smaller dataset of banks and insurance companies: although the size of the area and the overall length of the street network were quite similar to those of Trieste, the number of events was nearly one third lower than in the Italian case. 

    A uniform Kernel Density Estimation was performed over the banks and insurance companies of the central area of Swindon for comparison with the Network Density Estimation (Figures 15 and 16). The parameters used for the NDE on Trieste and Swindon were kept, both for the bandwidth (125 m) and for the spatial resolution of the interpolation performed after the density estimation for the three-dimensional visualisation (20 m cell size and IDW with a power of 1). The main clusters are visible in the same areas highlighted by the Network Density Estimation, namely the north-western and south-eastern parts of the study region. In this case, however, mainly circular clusters are visible, forming two groups of three clusters of different sizes in the two sub-regions. Compared with the NDE results, the uniform KDE produces circular shapes even where NDE shows clusters elongated along streets and roads. 

Figure 13 NDE on bank and insurance branches, 125 m bandwidth [area densities] (3D)

Figure 14 NDE on bank and insurance branches, 125 m bandwidth [area densities] (2D)

Figure 15 Uniform KDE on bank and insurance branches, 125 m bandwidth (3D)

    As in the analysis performed on Trieste data, density values obtained from linear and areal NDE were compared to those obtained via a uniform KDE (Table 2). Comparisons between the results can be made in two directions, first considering the differences in density values in Swindon after performing areal NDE and uniform KDE and then comparing the linear NDE in the two cases of Trieste and Swindon. 

    In Swindon, areal NDE yields higher densities than uniform KDE for all the statistics reported (maximum, minimum, mean and standard deviation), suggesting that NDE limits the dilution of density that is more pronounced in KDE. Although the sample is smaller than in the case of Trieste, NDE applied to the Swindon study region seems to be more effective in highlighting 'hot spots', particularly those located along street and road arcs. This might also be a consequence of the network structure of the study region, where a few major streets and roads dominate the spatial pattern of the network and therefore the spatial distribution of the point events. 

Figure 16 Uniform KDE on bank and insurance branches, 125 m bandwidth (2D)


Table 2 Comparison between Network Density Estimator using area and linear densities and uniform Kernel Density Estimator (Swindon, UK)



                 NDE (linear density)    NDE (service area density)    KDE (uniform)
                 events per linear km    events per sq km              events per sq km
max              12.50                   971.68                        207.36
min              1.05                    32.60                         20.74
mean             3.58                    119.70                        79.21
st. dev.         1.95                    98.51                         54.56
not null cells   439                     439                           1,010



    Comparing the linear NDE results for Trieste and Swindon, the maximum and minimum density values, as well as the mean and standard deviation, are quite similar, although both the number of events and the number of not null cells (439 against 1,748) are lower in Swindon than in Trieste. Areal NDE, however, reaches a higher maximum, minimum and standard deviation in Swindon than in Trieste (with a lower mean), while uniform KDE gives lower values in Swindon for all four statistics. Differences in the network structures of the two study regions play an important role in these results, as does the different number of events in the two sample datasets. The study region in Trieste is largely dominated by a Manhattan-like network structure, with several banks and insurance companies distributed in the central area of the city in a sort of 'financial district'. The compactness of the street network and of the event distribution makes it more difficult to differentiate the results of NDE and KDE, although linear clusters along major roads can be noticed. In the Swindon study area it is more difficult to identify a single 'financial district' formed by a clear concentration of banks and insurance companies. Both the point events and the network structure are less compact, with banks and insurance companies located quite clearly along a few streets that act as the 'backbone' of the network structure of the area. In such a network structure NDE and uniform KDE are more likely to give different results, both in density values and in the detection of linear clusters, which supports the suitability of NDE for such analyses.

6 Future Developments 

    NDE is not an alternative to KDE but a network-led complement to it for understanding human phenomena in the urban and extra-urban environment. Further implementations could consider restrictions on road networks, as well as different weightings of the arcs belonging to the network, in order to account for different cost functions attributed to them, morphological impediments and individual perceptions of the network. 

    Further research is needed to examine different network configurations and characteristics, in different cities with less well-structured street networks, and using more refined search functions, including weighting schemes for the events. Further developments might also include procedures for defining bandwidths for Network Density Estimation. As in other research on networks, such procedures should consider the structure of the network where the analysis is carried out: methods based on inter-event distances, such as nearest neighbour functions, should therefore rely on network distances. In this respect, the research presented here aims to link first and second order properties of point patterns in a network-constrained environment. A final note concerns the functions used to perform Network Density Estimation: further developments are needed to consider distance-based weighting functions in a network environment, similar to the normal, quartic or triangular ones already implemented in kernel density analysis. 
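One possible way of tying the bandwidth to the network structure is to adapt the k-nearest neighbour idea mentioned in note 2 (Williamson et al. 1999) to network distances. The sketch below is a hypothetical adaptation that was not tested in this article: it assumes that events sit at network nodes and returns the mean network distance from each event to its k-th nearest other event, which could serve as a candidate bandwidth.

```python
import heapq
from collections import defaultdict


def network_distances(edges, source):
    """Dijkstra shortest-path distances from `source` over undirected (u, v, length) edges."""
    adjacency = defaultdict(list)
    for u, v, length in edges:
        adjacency[u].append((v, length))
        adjacency[v].append((u, length))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for neighbour, length in adjacency[node]:
            nd = d + length
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                heapq.heappush(heap, (nd, neighbour))
    return dist


def mean_kth_nn_network_distance(edges, event_nodes, k=1):
    """Mean network distance from each event to its k-th nearest other event.

    Assumption (hypothetical set-up): every event is located at a network node.
    """
    kth_distances = []
    for event in event_nodes:
        dist = network_distances(edges, event)
        others = sorted(dist[e] for e in event_nodes if e != event and e in dist)
        if len(others) >= k:
            kth_distances.append(others[k - 1])
    if not kth_distances:
        raise ValueError("no event has k reachable neighbours")
    return sum(kth_distances) / len(kth_distances)


edges = [("A", "B", 100.0), ("B", "C", 50.0), ("C", "D", 200.0)]
events = ["A", "B", "C", "D"]
for k in (1, 2):
    print(k, mean_kth_nn_network_distance(edges, events, k))
```

As with the Euclidean version, increasing k increases the resulting distance and hence the degree of smoothing, but here the distances reflect the actual paths available along the network.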

7 Conclusions 

    In this article, network spatial analysis was considered with particular reference to the first order effects of the distribution of events over a study region characterized by the presence of a street network. As human activities usually take place in network-organized spaces, there is a need to refine the search functions of traditional methods for analysing regularity or clustering in the distribution of events, using network distances rather than Euclidean ones. The procedure presented is called Network Density Estimation (NDE) and is inspired by Kernel Density Estimation (KDE) as a method for analysing the local spatial distribution of event processes placed on a network in a study region. Its main characteristic is that it uses shortest-path trees rather than circular search regions for the density analysis, therefore counting only events that can be reached along the network’s segments. 

    A density analysis can be used to determine the concentration of ‘central’ human activities in an urban environment, allowing denser areas of activity to be visualised and supporting the analysis of the Central Business District by means of a three-dimensional surface. Such a surface also shows the gradient of urban density functions, which decrease from the central areas of the city in terms of urban land use and its value (Knos 1962, Haggett 2001) and of population distribution (Yeates and Gardner 1976). Given that the structure of urban areas is to some extent shaped by the orientation of the street and road network, considering a 'network space', consisting of the subset of geographical space close to the network itself, can be useful in understanding the spatial organization of some human activities. From this vantage point, a network-based density estimator seems promising for exploring issues related to urban forms and functions. 

    The method can be implemented using standard GIS and spatial statistics functions and has been illustrated using two datasets of banks and insurance companies, taken as samples of CBD activities (Murphy and Vance 1954b), located in the central areas of Trieste, Italy and Swindon, UK. The analysis was performed using NDE and a more traditional KDE in order to compare the two methods. Comparisons were also made between two flavours of Network Density Estimation, a linear density estimation and an areal one. Although the differences between the NDE and conventional KDE results are not very large, NDE seems to be more effective in highlighting 'linear' clusters oriented along a street network. 

    In the case studies, KDE and NDE do not differ very much, although NDE is more promising than KDE where the network structure is led by some major roads that guide the development of activities (e.g. shops in high streets, out-of-town retail activities) and point events tend to be distributed along them. In such cases KDE would create elongated clusters only where events are very close to each other, while sparser events, even along the same road, would form circular clusters. NDE can also handle events distributed in a compact network environment and is better suited to highlighting linear clusters, as it draws a clustering pattern more consistent with the network orientation. In particular, NDE and KDE provide similar results when the network structure is quite compact and regular, as generally happens in the central parts of grid-based cities, and point events are quite evenly distributed in the area. Larger differences appear when a single road or a set of major roads exerts a marked influence and a number of events are distributed along them. The procedure needs to be tested on different datasets of point events that take place on networks, and comparisons should be made between different urban areas and different parts of the same urban area to analyse the effects of different network structures on the estimator. 

    The method implemented in this article follows the more general naive estimator, and therefore does not consider a 'distance decay' effect for events farther from the estimation points. Although such a naive-like Network Density Estimator tends to produce less smooth surfaces and ignores the effect of distance on the density estimate, it provides a first basis for comparison with more traditional Euclidean density estimators. 

    Further developments of the estimator should therefore consider applying different functions in the network-constrained environment (e.g. quartic or normal kernels), in order to improve the estimator's performance in identifying clusters along networks and to weight close and distant events differently. That would allow a better visualisation of the differences in clustering between Euclidean and network environments.
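The difference between the naive estimator used here and distance-decay kernels can be illustrated by applying kernel-shaped weights to network distances. This is a hypothetical sketch of the proposed extension, not a method tested in this article: the network distances from a reference location to the events are assumed to be already known (e.g. from a shortest path computation), and the weights are left unnormalised, since how to normalise kernels on a network is itself a methodological question.

```python
def uniform_weight(d, h):
    """Naive estimator: every event within the bandwidth counts fully."""
    return 1.0 if d <= h else 0.0


def triangular_weight(d, h):
    """Linear distance decay, reaching zero at the bandwidth."""
    return max(0.0, 1.0 - d / h)


def quartic_weight(d, h):
    """Quartic (biweight) shape, unnormalised: (1 - (d/h)^2)^2 inside the bandwidth."""
    return (1.0 - (d / h) ** 2) ** 2 if d < h else 0.0


def weighted_network_count(distances, h, kernel):
    """Sum of kernel weights for events at the given network distances (metres)."""
    return sum(kernel(d, h) for d in distances)


# Hypothetical network distances (m) from one reference location to four events.
distances = [0.0, 62.5, 125.0, 200.0]
for kernel in (uniform_weight, triangular_weight, quartic_weight):
    print(kernel.__name__, weighted_network_count(distances, 125.0, kernel))
```

With the naive estimator the event exactly at the bandwidth counts as much as the one at the reference location, whereas the triangular and quartic weights progressively discount farther events.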

Notes 

  1. The grid used is a proxy for a continuous space and its cell’s size represents the resolution of the density surface obtained as well as the level of detail to be presented. 
  2. Different methods have been proposed for defining the bandwidth. Operationally, it has often been suggested to test different bandwidths according to the data distribution, particularly the size of the study area or the number of points, and according to the researcher’s aim and scale of observation. Bailey and Gatrell (1995) suggest a rough starting choice of τ = 0.68 n^(-0.2), where n is the number of events in R and R is the unit square. More recently, some authors (e.g. Williamson et al. 1999) have proposed a k-nearest neighbour approach, where the bandwidth is related to the mean nearest neighbour distance for different orders of k. Being based on the interpoint distances of the point pattern, it allows the user to control the degree of smoothing by means of the order k. The method has also been developed by Chainey et al. (2002). 
  3. Events and cell centroids located at a certain distance from the network can be assigned to the network itself, creating mirror reference locations on the network. Such operations are possible by means of the SANET software, a set of tools for network analyses for GIS software implemented by Okabe et al. (2006a). See also Okabe et al. (2006b) and Okabe and Yoshikawa (2003) for the application of SANET / SAINF extensions to GIS software for computing nearest neighbour distances of events on networks as well as second order point pattern analysis on a network. 
  4. This operation would include a larger number of cells as being close to the network, but would also introduce the network buffer width as another parameter to be estimated. 
  5. In doing that, grid cells should be selected after the definition of the bandwidth: cells farther from the network than the bandwidth should not be considered and should be assigned a null value, while the other cells within the bandwidth distance from the network should be considered and weighted according to their distance from the network, also including a higher impedance factor representing the ‘effort’ of crossing elements not on a network, such as buildings, parks, etc. 
  6. See note 2 above for some techniques used for defining bandwidths. 
  7. A ‘point in polygon’ operation, considering network service areas over the events’ distribution, could also have been considered for this step. However, this operation does not ensure that only events actually on the network are selected. It was therefore preferred to assign mirror events on the network and to rely on the ‘line intersection’ operation to select only those events actually belonging to the shortest path tree. 
  8. This represents a modification of the density analysis proposed using KDE, where density is computed with reference to the grid cell. The different approach in NDE is chosen to emphasize the role of network-service areas when computing densities. 
  9. In order to visualize the density function in three dimensions, a further interpolation of the dataset can be performed to produce a smoother visualisation of the density function, as is possible in packages such as Golden Software Surfer®. In this example Inverse Distance Weighting (IDW) is used to produce a surface view of the density estimation obtained. 
  10. In their analysis, Murphy and Vance (1954b) anticipated the capacity of the street and road network to drive the orientation and shape of a Central Business District. 
  11. The network analysis from the reference cell’s centroid was performed using the ‘Network Analyst’ extension in ESRI ArcGIS™ 8.3. Shortest-path trees and annexed bounding polygons from each reference cell were computed using this extension. The software SANET developed by Okabe et al. (2006a) was used for the realization of the mirror locations of both events and cells. Other computations in the GIS environment were performed using Intergraph GeoMedia Professional™ 5.2 and 6.0. 
  12. The same 125 m bandwidth was used in both the NDE and KDE analyses performed on the dataset. 
  13. In another analysis of local effects in an urban environment, a 300 m bandwidth was used (Thurstain-Goodwin and Unwin 2000). 
  14. The mirror locations on the network of banks and insurance companies were used in this operation, as original locations fall at a certain distance from the network, while the mirror events fall on the network itself, allowing a ‘line intersection’ operation to select those actually belonging to the shortest-path tree. 
  15. Results were processed using Golden Software Surfer™ 8.0. 
  16. CrimeStat III (Levine 2006) was used to perform a uniform Kernel Density Estimation. The results were exported as ESRI shapefiles as well as Golden Software dat files for three-dimensional visualization.
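The mirror locations described in notes 3 and 14 were produced with SANET. Their geometric idea can be sketched as an orthogonal projection onto the nearest network segment. This is a plain-Python illustration under stated assumptions (straight segments given as coordinate pairs in metres, hypothetical function names), not SANET's implementation; the optional `max_distance` mimics treating cells farther than a threshold (20 m in the Trieste analysis) from the network as null.

```python
import math


def project_onto_segment(px, py, ax, ay, bx, by):
    """Closest point on segment A-B to P; returns (qx, qy, offset_from_a, distance)."""
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        t = 0.0  # degenerate segment: A and B coincide
    else:
        # Parameter of the orthogonal projection, clamped to the segment's ends.
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    qx, qy = ax + t * dx, ay + t * dy
    return qx, qy, t * math.sqrt(seg_len2), math.hypot(px - qx, py - qy)


def mirror_location(point, segments, max_distance=None):
    """(segment index, offset, distance) of the nearest segment; None if beyond max_distance."""
    best = None
    for index, (a, b) in enumerate(segments):
        _, _, offset, dist = project_onto_segment(point[0], point[1], a[0], a[1], b[0], b[1])
        if best is None or dist < best[2]:
            best = (index, offset, dist)
    if best is None or (max_distance is not None and best[2] > max_distance):
        return None  # e.g. a cell centroid treated as null (not reached from the network)
    return best


segments = [((0.0, 0.0), (100.0, 0.0)), ((100.0, 0.0), (100.0, 100.0))]
print(mirror_location((30.0, 15.0), segments))        # segment 0, 30 m along it, 15 m away
print(mirror_location((30.0, 50.0), segments, 20.0))  # None: 50 m from the nearest segment
```

Storing the mirror location as a segment index plus an offset along it is what allows the subsequent network operations to test whether an event lies on a given shortest path tree.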

References 

  1. Atkinson P J and Unwin D J 1998 Comparisons and problems in applying density estimation techniques to the distribution of hepatitis A. Geographical Systems 5: 301–12
  2. Bailey T C and Gatrell A C 1995 Interactive Spatial Data Analysis. Harlow, Longman 
  3. Batty M 2005 Network geography: Relations, interactions, scaling and spatial processes in GIS. In Unwin D J and Fisher P (eds) Re-presenting Geographical Information Systems. Chichester, John Wiley and Sons: 149–70 
  4. Borruso G 2005 Network Density Estimation: Analysis of point patterns over a network. In Gervasi O, Gavrilova M L, Kumar V, Laganà A, Lee H P, Mun Y, Taniar D, and Tan C J K (eds) Computational Science and Its Applications (ICCSA 2005). Berlin, Springer Lecture Notes in Computer Science No 3482: 126–32 
  5. Borruso G 2006 The role of cartography in defining the Central Business District: A methodological approach. Bollettino dell’Associazione Italiana di Cartografia 126-127-128: 271–87 (in Italian) 
  6. Bracken I 1994 Population-related social indicators. In Fotheringham A S and Rogerson P (eds) Spatial Analysis and GIS. London, Taylor and Francis: 247–59 
  7. Brunsdon C 1995 Analysis of univariate census data. In Openshaw S (ed) Census Users’ Handbook. Cambridge, GeoInformation International: 213–38 
  8. Chainey S, Reid S, and Stuart N 2002 When is a hotspot a hotspot? A procedure for creating statistically robust hotspot maps of crime. In Kidner D, Higgs G, and White S (eds) Socio-Economic Applications of Geographic Information Science. London, Taylor and Francis: 21–36 
  9. Downs J A and Horner M W 2007a Characterising linear point patterns. In Proceedings of the GIScience Research UK Conference (GISRUK), Maynooth, Ireland 
  10. Downs J A and Horner M W 2007b Network-based kernel density estimation for home range analysis. In Proceedings of the Ninth International Conference on Geocomputation, Maynooth, Ireland 
  11. Epanechnikov V A 1969 Nonparametric estimation of a multivariate probability density. Theory of Probability and Its Applications 14: 153–8 
  12. Fotheringham A S, Brunsdon C, and Charlton M 2000 Quantitative Geography. London, Sage 
  13. Gatrell A 1994 Density estimation and the visualisation of point patterns. In Hearnshaw H M and Unwin D (eds) Visualisation in Geographical Information Systems. Chichester, John Wiley and Sons: 65–75 
  14. Gatrell A, Bailey T, Diggle P, and Rowlingson B 1996 Spatial point pattern analysis and its application in geographical epidemiology. Transactions of the Institute of British Geographers 21: 256–74 
  15. Haggett P 2000 Geography: A Global Synthesis. Harlow, Pearson Education 
  16. Knos D S 1962 Distribution of Land Values in Topeka, Kansas. Lawrence, KS, Bureau of Business and Economic Research 
  17. Levine N 2004 CrimeStat III: A Spatial Statistics Program for the Analysis of Crime Incident Locations. Washington, DC, National Institute of Justice 
  18. Levine N 2006 Crime mapping and the CrimeStat program. Geographical Analysis 38: 41–56 
  19. Miller H J 1994 Market area delimitation within networks using Geographic Information Systems. Geographical Systems 1: 157–73 
  20. Miller H J 1999 Measuring space-time accessibility benefits within transportation networks: Basic theory and computational methods. Geographical Analysis 31: 187–212 
  21. Murphy R E and Vance J E 1954a Delimiting the CBD. Economic Geography 30: 189–222 
  22. Murphy R E and Vance J E 1954b A comparative study of nine central business districts. Economic Geography 30: 301–36 
  23. Okabe A and Kitamura M 1996 A computational method for market area analysis on a network. Geographical Analysis 28: 330–49 
  24. Okabe A and Okunuki K 2001 A computational method for estimating the demand of retail stores on a street network and its implementation in GIS. Transactions in GIS 5: 209–20 
  25. Okabe A and Yamada I 2001 The K-function method on a network and its computational implementation. Geographical Analysis 33: 271–90 
  26. Okabe A and Yoshikawa T 2003 SAINF: A toolbox for analyzing the effect of point-like, line-like and polygon-like infrastructural features on the distribution of point-like non-infrastructural features. Journal of Geographical Systems 5: 407–13 
  27. Okabe A, Okunuki K, and Shiode S 2006a SANET: A toolbox for spatial analysis on a network. Geographical Analysis 38: 57–66 
  28. Okabe A, Okunuki K, and Shiode S 2006b The SANET toolbox: New methods for network spatial analysis. Transactions in GIS 10: 535–50 
  29. Okabe A, Yomono H, and Kitamura M 1995 Statistical analysis of the distribution of points on a network. Geographical Analysis 27: 151–75 
  30. O’Sullivan D and Unwin D J 2003 Geographic Information Analysis. Chichester, John Wiley and Sons 
  31. O’Sullivan D and Wong D W S 2007 A surface-based approach to measuring spatial segregation. Geographical Analysis 39: 147–68 
  32. Ripley B D 1976 The second-order analysis of stationary point processes. Journal of Applied Probability 13: 255–66
  33. Ripley B D 1981 Spatial Statistics. Chichester, John Wiley and Sons 
  34. de Smith M J, Goodchild M F, and Longley P A 2007 Geospatial Analysis: A Comprehensive Guide to Principles, Techniques and Software Tools (Second Edition). Leicester, Troubador
  35. Tobler W R 1979 Smooth pycnophylactic interpolation for geographical regions. Journal of the American Statistical Association 74: 519–30
  36. Thurstain-Goodwin M and Unwin D J 2000 Defining and delimiting the central areas of towns for statistical modelling using continuous surface representations. Transactions in GIS 4: 305–17 
  37. Silverman B W 1986 Density Estimation for Statistics and Data Analysis. London, Chapman and Hall
  38. Unwin D J and Fisher P 2005 Re-presenting Geographical Information Systems. Chichester, John Wiley and Sons
  39. Yamada I and Rogerson P A 2003 An empirical comparison of edge effect correction methods applied to K-function analysis. Geographical Analysis 35: 97–109 
  40. Yamada I and Thill J 2004 Comparison of planar and network K-functions in traffic accident analysis. Journal of Transport Geography 12: 149–58 
  41. Yamada I and Thill J 2007 Local indicators of network-constrained clusters in spatial point patterns. Geographical Analysis 39: 268–92 
  42. Yeates M H and Garner B J 1976 The North American City. New York, Harper and Row
  43. Williamson D, McLafferty S, Goldsmith V, Mollenkopf J, and McGuire P 1999 A better method to smooth crime incident data. ESRI ArcUser Magazine (January–March 1999) (available at http://www.esri.com/news/arcuser/0199/crimedata.html)